Object Detection

Description

The object detection tool is a powerful feature that enables the user to identify and locate various objects within images. It utilises a model pre-trained on a large dataset to localise and classify different objects.

By applying this tool to an image, the user can automatically localise and identify objects present in the image. The object detection tool offers a convenient and efficient way to perform object recognition tasks without the need for manual intervention or extensive personnel training.

Settings

Model Folder Path

The model folder path is the path to the folder containing the model files used to detect objects in the images. It can be a local path or a remote path.

See Model Folder Structure for more information about how to structure the model folder.
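
As a rough illustration only (not Zene's internal code), a TensorFlow 1 model folder like those described below could be loaded with OpenCV's DNN module. The folder and file names here (`my_model`, `frozen_inference_graph.pb`, `graph.pbtxt`) are hypothetical:

```python
import cv2

# Hypothetical model folder containing a TensorFlow 1 frozen graph (.pb)
# and its text graph definition (.pbtxt), per the tables below.
model_folder = "D:/my_model"

net = cv2.dnn.readNetFromTensorflow(
    model_folder + "/frozen_inference_graph.pb",  # model weights + graph
    model_folder + "/graph.pbtxt",                # graph configuration
)
```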

info

Supported Architectures

Here are the supported frameworks and architectures:

TensorFlow 1

*TensorFlow 1 architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| Faster R-CNN | Inception V2 | 1000x1000 | .pb & .pbtxt |
| Faster R-CNN | ResNet50 | 1000x1000 | .pb & .pbtxt |
| SSD | Inception V2 | 300x300 | .pb & .pbtxt |
| SSD | MobileNet V2 | 300x300 | .pb & .pbtxt |
| SSDLite | MobileNet V2 | 300x300 | .pb & .pbtxt |
DarkNet

*DarkNet architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| YOLO Tiny V3 | DarkNet | 416x416 | .weights & .cfg |
| YOLO V3 | DarkNet | 416x416 | .weights & .cfg |
| YOLO V4 | CSPDarkNet | 416x416 | .weights & .cfg |
TensorFlow 2 - ONNX

*TensorFlow 2 architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| CenterNet | Hourglass104 | 512x512 | .onnx |
| CenterNet | Hourglass104 | 1024x1024 | .onnx |
| EfficientDet-D0 | EfficientNet-B0 | 512x512 | .onnx |
| EfficientDet-D1 | EfficientNet-B1 | 640x640 | .onnx |
| EfficientDet-D2 | EfficientNet-B2 | 768x768 | .onnx |
| EfficientDet-D3 | EfficientNet-B3 | 896x896 | .onnx |
| EfficientDet-D4 | EfficientNet-B4 | 1024x1024 | .onnx |
| EfficientDet-D5 | EfficientNet-B5 | 1280x1280 | .onnx |
| EfficientDet-D6 | EfficientNet-B6 | 1280x1280 | .onnx |
| EfficientDet-D7 | EfficientNet-B7 | 1536x1536 | .onnx |
| SSD | MobileNet V2 | 320x320 | .onnx |
| SSD | MobileNet V2 | 640x640 | .onnx |
| SSD | MobileNet V2 | 1024x1024 | .onnx |
| Faster R-CNN | ResNet50 | 640x640 | .onnx |
| Faster R-CNN | ResNet50 | 1024x1024 | .onnx |
| Faster R-CNN | ResNet50 | 1333x800 | .onnx |
MMYolo

*MMYolo architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| YOLOv5-n | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-s | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-m | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-l | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-x | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-n | CSPDarkNet (P6) | 1280x1280 | .onnx |
| YOLOv5-s | CSPDarkNet (P6) | 1280x1280 | .onnx |
| YOLOv5-m | CSPDarkNet (P6) | 1280x1280 | .onnx |
| YOLOv5-l | CSPDarkNet (P6) | 1280x1280 | .onnx |
| YOLOv6-n | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv6-t | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv6-s | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv6-m | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv6-l | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv7-tiny | CSPDarknet53 (P5) | 640x640 | .onnx |
| YOLOv7-l | CSPDarknet53 (P5) | 640x640 | .onnx |
| YOLOv7-x | CSPDarknet53 (P5) | 640x640 | .onnx |
| YOLOv7-w | CSPDarknet53 (P6) | 1280x1280 | .onnx |
| YOLOv7-d | CSPDarknet53 (P6) | 1280x1280 | .onnx |
| YOLOv8-n | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOv8-s | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOv8-m | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOv8-l | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOv8-x | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOX-t | Swin Transformer | 640x640 | .onnx |
| YOLOX-s | Swin Transformer | 640x640 | .onnx |
| YOLOX-m | Swin Transformer | 640x640 | .onnx |
| YOLOX-l | Swin Transformer | 640x640 | .onnx |
| YOLOX-x | Swin Transformer | 640x640 | .onnx |
| PPYOLOE+-s | PVTv2 | 640x640 | .onnx |
| PPYOLOE+-m | PVTv2 | 640x640 | .onnx |
| PPYOLOE+-l | PVTv2 | 640x640 | .onnx |
| PPYOLOE+-x | PVTv2 | 640x640 | .onnx |
| RTMDet-tiny | ResNet-FPN | 640x640 | .onnx |
| RTMDet-s | ResNet-FPN | 640x640 | .onnx |
| RTMDet-m | ResNet-FPN | 640x640 | .onnx |
| RTMDet-l | ResNet-FPN | 640x640 | .onnx |
| RTMDet-x | ResNet-FPN | 640x640 | .onnx |
MMDetection

*MMDetection architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| RTMDet-tiny | ResNet-FPN | 640x640 | .onnx |
| RTMDet-s | ResNet-FPN | 640x640 | .onnx |
| RTMDet-m | ResNet-FPN | 640x640 | .onnx |
| RTMDet-l | ResNet-FPN | 640x640 | .onnx |
| RTMDet-x | ResNet-FPN | 640x640 | .onnx |
| YOLOX-t | Swin Transformer | 640x640 | .onnx |
| YOLOX-s | Swin Transformer | 640x640 | .onnx |
| YOLOX-m | Swin Transformer | 640x640 | .onnx |
| YOLOX-l | Swin Transformer | 640x640 | .onnx |
| YOLOX-x | Swin Transformer | 640x640 | .onnx |
| CenterNet | ResNet-18 | 1024x1024 | .onnx |
| CenterNet | ResNet-50 | 1024x1024 | .onnx |
| CenterNet | ResNet-101 | 1024x1024 | .onnx |
| CenterNet | ResNet-50 | 1088x800 | .onnx |
| Faster R-CNN | R-50-C4 (Caffe) | 1088x800 | .onnx |
| Faster R-CNN | R-50-DC5 (Caffe) | 1088x800 | .onnx |
| Faster R-CNN | R-50-FPN (Caffe) | 1088x800 | .onnx |
| Faster R-CNN | R-50-FPN (PyTorch) | 1088x800 | .onnx |
| Faster R-CNN | R-101-FPN (Caffe) | 1088x800 | .onnx |
| Faster R-CNN | R-101-FPN (PyTorch) | 1088x800 | .onnx |
| Faster R-CNN | X-101-32x4d-FPN (PyTorch) | 1088x800 | .onnx |
| Faster R-CNN | X-101-64x4d-FPN (PyTorch) | 1088x800 | .onnx |
| Swin (Mask R-CNN) | Swin-T | 1088x800 | .onnx |
| Cascade R-CNN | R-50-FPN (Caffe) | 1088x800 | .onnx |
| Cascade R-CNN | R-50-FPN (PyTorch) | 1088x800 | .onnx |
| Cascade R-CNN | R-101-FPN (Caffe) | 1088x800 | .onnx |
| Cascade R-CNN | R-101-FPN (PyTorch) | 1088x800 | .onnx |
| Cascade R-CNN | X-101-32x4d-FPN (PyTorch) | 1088x800 | .onnx |
| Cascade R-CNN | X-101-64x4d-FPN (PyTorch) | 1088x800 | .onnx |
Others

*Others architectures*

| Architecture | Backbone | Input Size | Model File Type | Link |
| ------------ | -------- | ---------- | --------------- | ---- |
| YOLO-World | DarkNet | 640x640 | .onnx | link |
tip

Zene UI

[Image: Setting Example]

The user can choose the model folder path by clicking on the folder icon and selecting the folder containing the model files (see File Explorer for more information).

In this example, the model folder is located at D:/ontf2_faster_rcnn_resnet50_v1_640x640. Below the model path selector, the user can see the Model Name (the folder name) and the model Framework.

Architecture

The architecture is the type of model used to detect objects in the images. See Supported Architectures for more information.

NMS Threshold

[Image: Example - 0.01 NMS]

[Image: Example - 0.99 NMS]

The NMS threshold is the overlap (IoU) threshold used for non-maximum suppression. Non-maximum suppression reduces the number of bounding boxes by discarding boxes that overlap too much with a higher-scoring box. A lower NMS threshold suppresses more overlapping boxes and therefore yields fewer detections; a higher threshold tolerates more overlap and yields more detections, potentially including duplicates.
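
A minimal sketch of the effect, using OpenCV's `cv2.dnn.NMSBoxes` as a stand-in (an assumption; Zene's own NMS implementation may differ):

```python
import cv2

# Boxes as [x, y, width, height] with matching confidence scores.
# The first two boxes overlap almost completely.
boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 80, 80]]
scores = [0.9, 0.8, 0.7]

# Low NMS threshold (0.01): almost any overlap is suppressed,
# so the two near-identical boxes collapse into one.
print(cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.01))  # keeps boxes 0 and 2

# High NMS threshold (0.99): heavy overlap is tolerated,
# so all three boxes survive.
print(cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.99))  # keeps boxes 0, 1 and 2
```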

Confidence Threshold

[Image: Example - 0.01 Confidence]

[Image: Example - 0.99 Confidence]

The confidence threshold is the minimum confidence score a detection should have to be considered valid. Detections with confidence scores below this threshold will be discarded. A higher confidence threshold will result in fewer detections, but with higher accuracy, whereas a lower threshold will result in more detections, but with potentially lower accuracy.
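
For example, filtering a hypothetical list of (label, score) detections at a 0.5 threshold:

```python
# Keep only detections whose score clears the confidence threshold.
detections = [("person", 0.92), ("person", 0.40), ("car", 0.75)]
threshold = 0.5

valid = [(label, score) for label, score in detections if score >= threshold]
print(valid)  # [('person', 0.92), ('car', 0.75)]
```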

Split Frame

Whether to split the frame into multiple frames before detecting objects. This is useful when the image is too large to be processed by the model.
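
One way such splitting is commonly implemented (a sketch under assumptions, not necessarily how Zene tiles frames):

```python
import numpy as np

def split_frame(image: np.ndarray, tile: int = 640):
    """Yield (x_offset, y_offset, crop) tiles covering the whole image."""
    height, width = image.shape[:2]
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, image[y:y + tile, x:x + tile]

# Detections from each tile are offset by (x_offset, y_offset) and merged,
# typically with a final NMS pass to de-duplicate boxes on tile borders.
```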

Transform Contains

A list of configurations for transforming a detected primary object into another object based on whether there are other objects within it. For example, to transform a detected person object into a group object when there are multiple person objects within the primary person object, the user can add a configuration to the Transform Contains list.

See Detected Object Transformation Contain for more information.
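
A simplified sketch of the idea (the configuration fields `source`, `target` and `minimum` are hypothetical names, not Zene's actual schema):

```python
def contains(outer, inner):
    """True if box `inner` lies fully inside box `outer` (x1, y1, x2, y2)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def transform_contains(detections, source="person", target="group", minimum=2):
    """Relabel a `source` box as `target` when it contains `minimum`+ others."""
    for det in detections:
        if det["label"] != source:
            continue
        inside = [d for d in detections
                  if d is not det and contains(det["box"], d["box"])]
        if len(inside) >= minimum:
            det["label"] = target
    return detections
```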

Transform Labels

A list of class labels' original names and their transformed names. Labels are renamed after the Transform Contains step, so labels produced by that transformation are also affected.
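
Conceptually this is a simple name mapping; the labels below are illustrative:

```python
# Hypothetical original -> transformed label names.
transform_labels = {"person": "pedestrian", "car": "vehicle"}

detections = [{"label": "person"}, {"label": "car"}, {"label": "dog"}]
for det in detections:
    # Unmapped labels keep their original name.
    det["label"] = transform_labels.get(det["label"], det["label"])

print([d["label"] for d in detections])  # ['pedestrian', 'vehicle', 'dog']
```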

Keep Labels

The keep labels are the labels that will be kept when detecting objects. For example, if the user wants to keep only the person label, they can add person to the keep labels list.

Ignore Labels

The ignore labels are the labels that will be ignored when detecting objects. For example, if the user wants to ignore the person label, they can add person to the ignore labels list.
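
A sketch of how the Keep Labels and Ignore Labels filters could combine (assuming an empty keep list means "keep everything"; Zene's exact semantics may differ):

```python
detections = [{"label": "person"}, {"label": "car"}, {"label": "dog"}]

keep_labels = {"person", "car"}  # keep only these (empty set = keep all)
ignore_labels = {"car"}          # always drop these

filtered = [d for d in detections
            if (not keep_labels or d["label"] in keep_labels)
            and d["label"] not in ignore_labels]

print([d["label"] for d in filtered])  # ['person']
```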

Advanced Settings

Whether to enable advanced settings. The advanced settings allow the user to have more control over the model parameters.

caution

Please use the advanced settings with caution as changing the parameters could result in unexpected behaviour.

To RGB

Advanced Settings

Convert from BGR colours (the default in Zene) to RGB colours. This is useful when using models trained on images with a different channel order.

Scale

Advanced Settings

The scale is a factor used to normalise the input image before it is passed to the model: each pixel value is divided by the scale factor. A higher scale factor therefore produces smaller normalised pixel values, while a lower scale factor produces larger ones.

Mean

Advanced Settings

The mean is a value used to normalise the input image before it is passed to the model: the mean is subtracted from each pixel value. A higher mean value shifts the normalised image darker, while a lower mean value leaves it brighter.

Input Pixel (width and height)

Advanced Settings

The input pixel setting specifies the width and height of the image that is passed to the model.
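
The four settings above (To RGB, Scale, Mean and Input Pixel) map onto a standard DNN preprocessing step. As an illustration only, here is how they combine in OpenCV's `cv2.dnn.blobFromImage`; Zene's exact pipeline may differ:

```python
import cv2
import numpy as np

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in BGR frame

blob = cv2.dnn.blobFromImage(
    image,
    scalefactor=1.0 / 255.0,     # Scale: divide pixel values by 255
    size=(640, 640),             # Input Pixel: model input width x height
    mean=(127.5, 127.5, 127.5),  # Mean: subtracted from each channel
    swapRB=True,                 # To RGB: swap BGR (Zene default) to RGB
)
# blobFromImage resizes, subtracts the mean, applies the scale factor and
# swaps channels, producing an NCHW float tensor ready for inference.
```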

Display Results

Overlay Results

Whether to draw the results on top of the image frame.
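
As a sketch of what overlaying typically involves (not Zene's rendering code), each detection could be drawn with OpenCV:

```python
import cv2

def overlay_results(frame, detections):
    """Draw each detection as a green box with a 'label score' caption."""
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        caption = f'{det["label"]} {det["score"]:.2f}'
        cv2.putText(frame, caption, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return frame
```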